Introduction



These very short lecture notes are not meant to be an exhaustive presentation of the topic, but only
a short list of results, concepts and ideas that are useful when dealing with the theory of Optimal
Transport for the first time. Several of these ideas were used, and explained in greater detail,
during the other classes of the Summer School "Optimal transportation: Theory and applications",
which was the occasion for writing these notes. The style chosen when preparing them, in view of
their use during the Summer School, was highly informal, and this revised version keeps the same
style.

The main references for the whole topic are the two books on the subject by C. Villani ([16, 17]).
As far as curves in the space of probability measures are concerned, the best specifically focused
reference is [2]. Moreover, I am also very indebted to the approach that L. Ambrosio used in a course
at SNS Pisa in 2001/02, and I want to cite [1] as another possible reference.

The motivation for the whole subject is the following problem proposed by Monge in 1781 ([14]):
given two densities of mass $f, g \ge 0$ on $\mathbb{R}^d$, with $\int f = \int g = 1$, find a map
$T : \mathbb{R}^d \to \mathbb{R}^d$ pushing the first one onto the other, i.e. such that

\[ \int_A g(x)\,dx = \int_{T^{-1}(A)} f(y)\,dy \quad \text{for any Borel subset } A \subset \mathbb{R}^d \tag{0.1} \]

and minimizing the quantity

\[ \int |T(x) - x|\, f(x)\,dx \]

among all the maps satisfying this condition. This means that we have a collection of particles,
distributed with density $f$ on $\mathbb{R}^d$, that have to be moved so that they arrange according
to a new distribution, whose density $g$ is prescribed. The movement has to be chosen so as to
minimize the average displacement. The map $T$ describes the movement (that we must choose in an
optimal way), and $T(x)$ represents the destination of the particle originally located at $x$. The
constraint on $T$ precisely accounts for the fact that we need to reconstruct the density $g$. In the
following, we will always define, similarly to (0.1), the image measure of a measure $\mu$ on $X$
(measures will indeed replace the densities $f$ and $g$ in the most general formulation of the
problem) through a measurable map $T : X \to Y$: it is the measure denoted by $T_\#\mu$ on $Y$ and
characterized by

\[ T_\#\mu(A) = \mu(T^{-1}(A)) \quad \text{for every measurable set } A, \]

\[ \text{or} \quad \int_Y \phi \, d(T_\#\mu) = \int_X \phi \circ T \, d\mu \quad \text{for every measurable function } \phi. \]
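To make the definition of the image measure concrete, here is a minimal sketch for a measure with finitely many atoms. The names `push_forward` and `integrate`, the four-atom measure and the map $T(x) = |x|$ are illustrative assumptions, not objects taken from these notes; the sketch only checks that the two characterizations above agree on this toy case.

```python
from collections import defaultdict
from fractions import Fraction

def push_forward(mu, T):
    """Image measure T#mu of a discrete measure mu = {point: mass}."""
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[T(x)] += mass          # the mass sitting at x is carried to T(x)
    return dict(nu)

def integrate(phi, measure):
    """Integral of phi against a discrete measure."""
    return sum(phi(x) * mass for x, mass in measure.items())

# Hypothetical example: four equal atoms, folded by T(x) = |x|.
quarter = Fraction(1, 4)
mu = {-2: quarter, -1: quarter, 1: quarter, 2: quarter}
T = abs
nu = push_forward(mu, T)

phi = lambda y: y * y + 1
lhs = integrate(phi, nu)                      # integral of phi d(T#mu)
rhs = integrate(lambda x: phi(T(x)), mu)      # integral of (phi o T) dmu
```

In this example the atoms at $-2$ and $2$ are merged, as are those at $-1$ and $1$, so $T_\#\mu$ has two atoms of mass $1/2$ and both integrals take the same value.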

The problem of Monge remained without solution (does a minimizer exist? how can it be
characterized? ...) until the progress made in the 1940s. Indeed, only with the work of Kantorovich
(1942) was it inserted into a suitable framework which made it possible to approach it and, later,
to find that solutions actually exist and to study them. The problem has since been widely
generalized, with very general cost functions $c(x,y)$ instead of the Euclidean distance $|x - y|$
and more general measures and spaces. For simplicity, here we will not try to present a very wide
theory on generic metric spaces, manifolds and so on, but we will deal only with the Euclidean case.
1 Primal and dual problems



In what follows we will suppose $\Omega$ to be a (very often compact) domain of $\mathbb{R}^d$ and the
cost function $c : \Omega \times \Omega \to [0, +\infty[$ will be supposed continuous and symmetric
(i.e. $c(x, y) = c(y, x)$).

1.1 Kantorovich and Monge problems 

The generalization that appears as natural from the work of Kantorovich ([E]) of the problem raised 
by Monge is the following: 

Problem 1. Given two probability measures $\mu$ and $\nu$ on $\Omega$ and a cost function
$c : \Omega \times \Omega \to [0, +\infty]$ we consider the problem

\[ (K) \qquad \min \left\{ \int_{\Omega \times \Omega} c \, d\gamma \;:\; \gamma \in \Pi(\mu, \nu) \right\}, \tag{1.1} \]

where $\Pi(\mu,\nu)$ is the set of the so-called transport plans, i.e.
$\Pi(\mu,\nu) = \{\gamma \in \mathcal{P}(\Omega \times \Omega) : (p^+)_\#\gamma = \mu,\ (p^-)_\#\gamma = \nu\}$,
where $p^+$ and $p^-$ are the two projections of $\Omega \times \Omega$ onto $\Omega$. These
probability measures over $\Omega \times \Omega$ are an alternative way to describe the displacement
of the particles of $\mu$: instead of saying, for each $x$, which is the destination $T(x)$ of the
particle originally located at $x$, we say for each pair $(x, y)$ how many particles go from $x$ to
$y$. It is clear that this description allows for more general movements, since from a single point
$x$ particles can a priori move to different destinations $y$. If multiple destinations really
occur, then this movement cannot be described through a map $T$. Notice that the constraints on
$(p^\pm)_\#\gamma$ exactly mean that we restrict our attention to the movements that really take
particles distributed according to the distribution $\mu$ and move them onto the distribution $\nu$.

The minimizers for this problem are called optimal transport plans between $\mu$ and $\nu$. Should
$\gamma$ be of the form $(id \times T)_\#\mu$ for a measurable map $T : \Omega \to \Omega$ (i.e. when
no splitting of the mass occurs), the map $T$ would be called optimal transport map from $\mu$ to
$\nu$.

Remark 1. It can be easily checked that if $(id \times T)_\#\mu$ belongs to $\Pi(\mu, \nu)$ then $T$
pushes $\mu$ onto $\nu$ (i.e. $\nu(A) = \mu(T^{-1}(A))$ for any Borel set $A$) and the functional
takes the form $\int c(x, T(x))\,\mu(dx)$, thus generalizing Monge's problem.

This generalized problem by Kantorovich is much easier to handle than the original one proposed
by Monge: for instance in the Monge case we would need existence of at least one map $T$ satisfying
the constraints. This is not verified when $\mu = \delta_0$, if $\nu$ is not a single Dirac mass. On
the contrary, there always exist transport plans in $\Pi(\mu, \nu)$ (for instance
$\mu \otimes \nu \in \Pi(\mu, \nu)$). Moreover, one can state that (K) is the relaxation of the
original problem by Monge: if one considers the problem in the same setting, where the competitors
are transport plans, but sets the functional at $+\infty$ on all the plans that are not of the form
$(id \times T)_\#\mu$, then one has a functional on $\Pi(\mu, \nu)$ whose relaxation is the
functional in (K) (see [3]).
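The contrast between maps and plans in the case $\mu = \delta_0$ can be seen on a two-point target. The following is a small sketch under illustrative assumptions: the helper names `product_plan`, `marginals` and `cost` are hypothetical, and $\nu = (\delta_{-1} + \delta_1)/2$ is chosen only as the simplest measure that is not a single Dirac mass.

```python
from fractions import Fraction
from itertools import product

def product_plan(mu, nu):
    """The plan mu (x) nu, giving mass mu(x) * nu(y) to the pair (x, y)."""
    return {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}

def marginals(gamma):
    """First and second marginals (p+)#gamma and (p-)#gamma of a discrete plan."""
    first, second = {}, {}
    for (x, y), mass in gamma.items():
        first[x] = first.get(x, 0) + mass
        second[y] = second.get(y, 0) + mass
    return first, second

def cost(gamma, c):
    """Value of the Kantorovich functional on a discrete plan."""
    return sum(c(x, y) * mass for (x, y), mass in gamma.items())

half = Fraction(1, 2)
mu = {0: Fraction(1)}                 # mu = delta_0
nu = {-1: half, 1: half}              # nu = (delta_{-1} + delta_1) / 2

gamma = product_plan(mu, nu)
plan_cost = cost(gamma, lambda x, y: abs(x - y))

# A map T sends the single atom of mu to the single point T(0),
# so T#mu is always one Dirac mass and can never equal nu.
images_of_maps = [{t: Fraction(1)} for t in nu]   # T(0) = -1 or T(0) = 1
```

Here the product plan splits the atom at $0$ into two halves, which is exactly the kind of movement that no map $T$ can describe.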

Anyway, it is important to notice that an easy use of the Direct Method of Calculus of Variations 
(i.e. taking a minimizing sequence, saying that it is compact in some topology - here it is the weak 
convergence of probability measures - finding a limit, and proving semicontinuity (or continuity) of 
the functional we minimize, so that the limit is a minimizer) proves that a minimum does exist. 
As a consequence, if one is interested in the problem of Monge, the question may become "does
this minimum come from a transport map T?". Actually, if the answer to this question is yes, then 
it is evident that the problem of Monge has a solution, which also solves a wider problem, that of 
minimizing among transport plans. In some cases proving that the optimal transport plan comes 
from a transport map (or proving that there exists at least one optimal plan coming from a map) 
is equivalent to proving that the problem of Monge has a solution, since very often the infimum 
among transport plans and among transport maps is the same. Yet, in the presence of atoms, this 
is not always the case, but we will not insist any more on this degenerate case. 



1.2 Duality 

Since the problem (K) is a linear optimization under linear constraints, an important tool will be 
duality theory, which is typically used for convex problems. We will find a dual problem (D) for (K) 
and exploit the relations between dual and primal. 

The first thing we will do is to find a formal dual problem, by means of an inf-sup exchange.

First express the constraint $\gamma \in \Pi(\mu, \nu)$ in the following way: notice that, if
$\gamma$ is a non-negative measure on $\Omega \times \Omega$, then we have

\[ \sup_{\phi, \psi} \int \phi \, d\mu + \int \psi \, d\nu - \int (\phi(x) + \psi(y)) \, d\gamma =
\begin{cases} 0 & \text{if } \gamma \in \Pi(\mu, \nu), \\ +\infty & \text{otherwise.} \end{cases} \]

Hence, one can remove the constraints on $\gamma$ by adding the previous sup, since if they are
satisfied nothing has been added, and if they are not one gets $+\infty$, which will be avoided by
the minimization. Hence we may look at the problem we get and interchange the inf in $\gamma$ and the
sup in $\phi, \psi$:

\[ \min_{\gamma} \int c \, d\gamma + \sup_{\phi, \psi} \left( \int \phi \, d\mu + \int \psi \, d\nu - \int (\phi(x) + \psi(y)) \, d\gamma \right) \]
\[ = \sup_{\phi, \psi} \int \phi \, d\mu + \int \psi \, d\nu + \inf_{\gamma} \int \big( c(x, y) - (\phi(x) + \psi(y)) \big) \, d\gamma. \]

Obviously it is not always possible to exchange inf and sup, and the main tool to do it is a theorem
by Rockafellar requiring concavity in one variable, convexity in the other one, and some compactness
assumption. We will not investigate further whether these assumptions are satisfied in this case,
but the result is true.

Afterwards, one can re-write the inf in $\gamma$ as a constraint on $\phi$ and $\psi$, since one has

\[ \inf_{\gamma} \int \big( c(x, y) - (\phi(x) + \psi(y)) \big) \, d\gamma =
\begin{cases} 0 & \text{if } \phi(x) + \psi(y) \le c(x, y) \text{ for all } (x, y) \in \Omega \times \Omega, \\ -\infty & \text{otherwise.} \end{cases} \]

This leads to the following dual optimization problem.
Problem 2. Given the two probabilities $\mu$ and $\nu$ on $\Omega$ and the cost function
$c : \Omega \times \Omega \to [0, +\infty]$ we consider the problem

\[ (D) \qquad \max \left\{ \int_\Omega \phi \, d\mu + \int_\Omega \psi \, d\nu \;:\; \phi \in L^1(\mu),\ \psi \in L^1(\nu),\ \phi(x) + \psi(y) \le c(x, y) \text{ for all } (x, y) \in \Omega \times \Omega \right\}. \tag{1.2} \]



This problem does not admit a straightforward existence result, since the class of admissible
functions lacks compactness. Yet, we can better understand this problem and find existence once
we have introduced the notion of $c$-transform (a kind of generalization of the well-known Legendre
transform).

Definition 1. Given a function $\chi : \Omega \to \overline{\mathbb{R}}$ we define its $c$-transform
(or $c$-conjugate function) by

\[ \chi^c(y) = \inf_{x \in \Omega} c(x, y) - \chi(x). \]

Moreover, we say that a function $\psi$ is $c$-concave if there exists $\chi$ such that
$\psi = \chi^c$ and we denote by $\Psi_c(\Omega)$ the set of $c$-concave functions.

It is quite easy to realize that, given a pair $(\phi, \psi)$ in the maximization problem (D), one
can always replace it with $(\phi, \phi^c)$, and then with $(\phi^{cc}, \phi^c)$, and the
constraints are preserved and the integrals increased. Actually one could go on, but it is possible
to prove that $\phi^{ccc} = \phi^c$ for any function $\phi$. This is the same as saying that
$\psi^{cc} = \psi$ for any $c$-concave function $\psi$, and this perfectly mirrors what happens for
the Legendre transform of convex functions (which corresponds to the particular case
$c(x, y) = x \cdot y$).
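The replacement procedure just described can be tried out on a finite set of points, where the inf in the $c$-transform is a min. This is only a sketch under stated assumptions: the seven-point "domain", the cost $(x-y)^2$, the starting function `phi` and the two measures are arbitrary illustrative choices, and integer values are used so that the identity $\phi^{ccc} = \phi^c$ can be tested exactly.

```python
from fractions import Fraction

points = list(range(-3, 4))            # hypothetical finite stand-in for the domain
cost = lambda x, y: (x - y) ** 2       # symmetric cost; integers keep the arithmetic exact

def c_transform(chi):
    """chi^c(y) = min over x of c(x, y) - chi(x)."""
    return {y: min(cost(x, y) - chi[x] for x in points) for y in points}

def admissible(phi, psi):
    """Constraint of (D): phi(x) + psi(y) <= c(x, y) for all pairs."""
    return all(phi[x] + psi[y] <= cost(x, y) for x in points for y in points)

def dual_value(phi, psi, mu, nu):
    """Integral of phi against mu plus integral of psi against nu."""
    return sum(phi[x] * mu[x] for x in points) + sum(psi[y] * nu[y] for y in points)

mu = {x: Fraction(1, 7) for x in points}                              # uniform on the 7 points
nu = {y: Fraction(1, 4) if y >= 0 else Fraction(0) for y in points}   # uniform on {0, 1, 2, 3}

phi = {-3: 0, -2: 5, -1: 1, 0: 4, 1: 0, 2: 2, 3: 7}    # arbitrary starting function
phi_c = c_transform(phi)
psi = {y: phi_c[y] - 1 for y in points}                # admissible, but not the best partner of phi
phi_cc = c_transform(phi_c)
phi_ccc = c_transform(phi_cc)

values = [dual_value(phi, psi, mu, nu),        # original pair
          dual_value(phi, phi_c, mu, nu),      # psi replaced by phi^c
          dual_value(phi_cc, phi_c, mu, nu)]   # phi replaced by phi^cc
```

The three entries of `values` are non-decreasing, all three pairs satisfy the constraint of (D), and a further transform gives back `phi_c`.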

A consequence of these considerations is the following well-known result.

Proposition 1.1. We have

\[ \min(K) = \max_{\psi \in \Psi_c(\Omega)} \int_\Omega \psi \, d\mu + \int_\Omega \psi^c \, d\nu, \tag{1.3} \]

where the max on the right hand side is attained. In particular the minimum value of (K) is a convex
function of $(\mu, \nu)$, as it is a supremum of linear functionals.

Definition 2. The functions $\psi$ realizing the maximum in (1.3) are called Kantorovich potentials
for the transport from $\mu$ to $\nu$. This is in fact a small abuse, because usually this term is
used only in the case $c(x, y) = |x - y|$, but it is usually understood in the general case as well.

Notice that any $c$-concave function shares the same modulus of continuity as the cost $c$. This is
the reason why one can prove existence for (D) (which is the same as the right hand side problem
in the previous proposition), by applying the Ascoli-Arzelà Theorem.

In particular, in the case $c(x, y) = |x - y|^p$, if $\Omega$ is bounded with diameter $D$, any
$\psi \in \Psi_c(\Omega)$ is $pD^{p-1}$-Lipschitz continuous. Notice that the case where $c$ is a
power of the distance is actually of particular interest and two values of the exponent $p$ are
remarkable: the cases $p = 1$ and $p = 2$. In these two cases we provide characterizations for the
set of $c$-concave functions. Let us denote by $\Psi_{(p)}(\Omega)$ the set of $c$-concave functions
with respect to the cost $c(x, y) = |x - y|^p / p$. It is not difficult to check that

\[ \psi \in \Psi_{(1)}(\Omega) \iff \psi \text{ is a 1-Lipschitz function;} \]

\[ \psi \in \Psi_{(2)}(\Omega) \implies x \mapsto \frac{|x|^2}{2} - \psi(x) \text{ is a convex function; if } \Omega = \mathbb{R}^d \text{ this is an equivalence.} \]

1.3 The case $c(x, y) = |x - y|$

The case $c(x, y) = |x - y|$ shows a lot of interesting features, even if from the point of view of
the existence of an optimal map $T$ it is one of the most difficult. A first interesting property is
the following:

Proposition 1.2. For any 1-Lipschitz function $\psi$ we have $\psi^c = -\psi$. In particular,
Formula (1.3) may be re-written as

\[ \min(K) = \max(D) = \max_{\psi \in \mathrm{Lip}_1} \int \psi \, d(\mu - \nu). \]

The key point of the previous proposition is proving $\psi^c = -\psi$. This is easy if one considers
that $\psi^c(y) = \inf_x |x - y| - \psi(x) \le -\psi(y)$ (taking $x = y$), but also
$\psi^c(y) = \inf_x |x - y| - \psi(x) \ge \inf_x |x - y| - |x - y| - \psi(y) = -\psi(y)$ (making use
of the Lipschitz behaviour of $\psi$, i.e. $\psi(x) \le \psi(y) + |x - y|$).
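The identity $\psi^c = -\psi$ can be observed on an integer grid, where the inf is a min and all computations are exact. The grid, the 1-Lipschitz function `psi` and the non-Lipschitz comparison function `chi` below are illustrative assumptions, not examples from the notes.

```python
points = range(-5, 6)                          # hypothetical grid standing in for the domain

def c_transform(chi):
    """chi^c(y) = min over x of |x - y| - chi(x)."""
    return {y: min(abs(x - y) - chi[x] for x in points) for y in points}

def is_one_lipschitz(chi):
    return all(abs(chi[x] - chi[y]) <= abs(x - y) for x in points for y in points)

psi = {x: abs(abs(x) - 2) for x in points}     # 1-Lipschitz: composition of 1-Lipschitz maps
chi = {x: 2 * x for x in points}               # slope 2, hence not 1-Lipschitz

psi_c = c_transform(psi)
chi_c = c_transform(chi)
minus_psi = {x: -psi[x] for x in points}
minus_chi = {x: -chi[x] for x in points}
```

For `psi` the transform is exactly `minus_psi`, while for `chi` the two differ, which shows that the Lipschitz bound is really used in the second inequality of the argument.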

Another peculiar feature of this case is the following: 

Proposition 1.3. Consider the problem

\[ (B) \qquad \min \left\{ M(\lambda) \;:\; \lambda \in \mathcal{M}^d(\Omega);\ \nabla \cdot \lambda = \mu - \nu \right\}, \tag{1.4} \]

where $M(\lambda)$ denotes the mass of the vector measure $\lambda$ and the divergence condition is
to be read in the weak sense, with Neumann boundary conditions, i.e.
$-\int \nabla\phi \cdot d\lambda = \int \phi \, d(\mu - \nu)$ for any
$\phi \in C^1(\overline{\Omega})$. If $\Omega$ is convex then it holds

\[ \min(K) = \min(B). \]

This proposition links the Monge-Kantorovich problem to a minimal flow problem which was first
proposed by Beckmann in [5], under the name of continuous transportation model. He did not know
this link, as Kantorovich's theory was being developed independently almost in the same years. In
Section 2.1 we will see some more details on this model and on the possibility of generalizing it to
the case of distances $c(x, y)$ coming from Riemannian metrics. In particular, in the case of a
nonconvex $\Omega$, (B) would be equivalent to a Monge-Kantorovich problem where $c$ is the geodesic
distance on $\Omega$.

To have an idea of why these equivalences between (B) and (K) hold true, one can look at the 
following considerations. 

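First, a formal computation. We take the problem (B) and re-write the constraint on $\lambda$ by
means of the quantity

\[ \sup_{\phi} \int \nabla\phi \cdot d\lambda + \int \phi \, d(\mu - \nu) =
\begin{cases} 0 & \text{if } \nabla \cdot \lambda = \mu - \nu, \\ +\infty & \text{otherwise,} \end{cases} \]

where the divergence condition is read in the weak sense, i.e.
$-\int \nabla\phi \cdot d\lambda = \int \phi \, d(\mu - \nu)$ for every test function $\phi$.
Hence one can write (B) as

\[ \min_{\lambda} M(\lambda) + \sup_{\phi} \int \nabla\phi \cdot d\lambda + \int \phi \, d(\mu - \nu)
= \sup_{\phi} \int \phi \, d(\mu - \nu) + \inf_{\lambda} M(\lambda) + \int \nabla\phi \cdot d\lambda, \]

where inf and sup have been exchanged formally as in the previous computations. After that one
notices that

\[ \inf_{\lambda} M(\lambda) + \int \nabla\phi \cdot d\lambda
= \inf_{\lambda} \int d|\lambda| \left( 1 + \nabla\phi \cdot \frac{\lambda}{|\lambda|} \right)
= \begin{cases} 0 & \text{if } \|\nabla\phi\|_\infty \le 1, \\ -\infty & \text{otherwise,} \end{cases} \]

and this leads to the dual formulation for (B) which gives

\[ \sup_{\phi \,:\, |\nabla\phi| \le 1} \int_\Omega \phi \, d(\mu - \nu). \]

Since this problem is exactly the same as (D) (a consequence of the fact that $\mathrm{Lip}_1$
functions are exactly those functions whose gradient is smaller than 1), this gives the equivalence
between (B) and (K).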

Most of the considerations above, especially those on the problem (B), do not hold for costs other
than the distance $|x - y|$. The only possible generalizations I know concern either a cost $c$ which
comes from a Riemannian distance $k(x)$ (i.e.
$c(x, y) = \inf\{ \int_0^1 k(\sigma(t)) |\sigma'(t)| \, dt : \sigma(0) = x,\ \sigma(1) = y \}$,
which gives a problem (B) with $\int k(x) \, d|\lambda|$ instead of $M(\lambda)$) or the fact that
$p$-homogeneous costs may become 1-homogeneous through the introduction of time as an extra variable
(see [11]). Some more details on the problem (B) can be found in the lecture notes on "Models and
applications of optimal transport in economics, traffic and urban planning" of this same Summer
School, [15].



1.4 $c(x, y) = h(x - y)$ with $h$ strictly convex and the existence of an optimal $T$

We summarize here some useful results for the case where the cost c is of the form c(x, y) = h(x — y), 
for a strictly convex function h. 

The main tool is the duality result. If we have equality between the minimum of (K) and the
maximum of (D) and both extremal values are realized, one can consider an optimal transport plan
$\gamma$ and a Kantorovich potential $\psi$ and write

\[ \psi(x) + \psi^c(y) \le c(x, y) \text{ on } \Omega \times \Omega \quad \text{and} \quad \psi(x) + \psi^c(y) = c(x, y) \text{ on } \operatorname{spt} \gamma. \]

The equality on $\operatorname{spt} \gamma$ is a consequence of the inequality which is valid
everywhere and of

\[ \int c \, d\gamma = \int \psi \, d\mu + \int \psi^c \, d\nu = \int (\psi(x) + \psi^c(y)) \, d\gamma, \]

which implies equality $\gamma$-a.e. These functions being continuous, the equality passes to the
support of the measure.
Once we have that, let us fix a point $(x_0, y_0) \in \operatorname{spt} \gamma$. One may deduce from
the previous computations that

\[ x \mapsto h(x - y_0) - \psi(x) \text{ is minimal at } x = x_0 \]

and, if $\psi$ is differentiable at $x_0$, one gets $\nabla\psi(x_0) \in \partial h(x_0 - y_0)$. For
a strictly convex function $h$ one may invert the relation passing to $\nabla h^*$, thus getting

\[ x_0 - y_0 = \nabla h^*(\nabla\psi(x_0)) = (\partial h)^{-1}(\nabla\psi(x_0)). \]

This solves several questions concerning the transport problem with this cost, provided $\psi$ is
differentiable a.e. with respect to $\mu$. This is usually guaranteed by requiring $\mu$ to be
absolutely continuous with respect to the Lebesgue measure, and using the fact that $\psi$ may be
proven to be Lipschitz. Then, one may use the previous computation to deduce that, for every $x_0$,
the point $y_0$ such that $(x_0, y_0) \in \operatorname{spt} \gamma$ is unique (i.e. $\gamma$ is of
the form $(id \times T)_\#\mu$ where $T(x_0) = y_0$). Moreover, this also gives uniqueness of the
optimal transport plan and of the gradient of the Kantorovich potential.
We may summarize everything in the following theorem:

Theorem 1.4. Given $\mu$ and $\nu$ probability measures on a domain $\Omega \subset \mathbb{R}^d$
there exists an optimal transport plan $\pi$. It is unique and of the form $(id \times T)_\#\mu$,
provided $\mu$ is absolutely continuous. Moreover there exists also at least a Kantorovich potential
$\psi$, and the gradient $\nabla\psi$ is uniquely determined $\mu$-a.e. (in particular $\psi$ is
unique up to additive constants, provided the density of $\mu$ is positive a.e. on $\Omega$). The
optimal transport map $T$ and the potential $\psi$ are linked by
$T(x) = x - (\nabla h^*)(\nabla\psi(x))$. Moreover we have $\psi(x) + \psi^c(T(x)) = c(x, T(x))$ for
$\mu$-a.e. $x$. Conversely, every map $T$ which is of the form
$T(x) = x - (\nabla h^*)(\nabla\psi(x))$ for a function $\psi \in \Psi_c(\Omega)$ is an optimal
transport map from $\mu$ to $T_\#\mu$.

Remark 2. Actually, the existence of an optimal transport map is true under weaker assumptions:
we can replace the condition of being absolutely continuous with the condition "$\mu(A) = 0$ for any
$A \subset \mathbb{R}^d$ such that $\mathcal{H}^{d-1}(A) < +\infty$" or with any condition which
ensures that the non-differentiability set of $\psi$ is negligible. In the theorem we used the
Lipschitz behavior of $\psi \in \Psi_c$ and applied Rademacher Theorem, but $c$-concave functions are
often more regular than only Lipschitz.

Remark 3. In Theorem 1.4 only the part concerning the optimal map $T$ is not symmetric in $\mu$ and
$\nu$: hence the uniqueness of the Kantorovich potential is true even if it is $\nu$ (and not $\mu$)
that has positive density a.e. (since one can retrieve $\psi$ from $\psi^c$ and vice versa).

Remark 4. Theorem 1.4 may be particularized to the quadratic case $c(x, y) = |x - y|^2 / 2$, thus
getting the existence of an optimal transport map

\[ T(x) = x - \nabla\psi(x) = \nabla\left( \frac{|x|^2}{2} - \psi(x) \right) = \nabla\phi(x) \]

for a convex function $\phi$. By using the converse implication (sufficient optimality conditions),
this also proves the existence and the uniqueness of a gradient of a convex function transporting
$\mu$ onto $\nu$.

This well known fact has been investigated first by Brenier (see [6]) and is often known as Brenier's 
Theorem. 

Let us moreover notice that a specific approach for the case $|x - y|^2$, based on the fact that
we can withdraw the parts of the cost depending on $x$ or $y$ only and maximize
$\int x \cdot y \, d\gamma$, gives the same result in an easier way: we actually get
$\phi(x_0) + \phi^*(y_0) = x_0 \cdot y_0$ for a convex function $\phi$ and its Legendre transform
$\phi^*$, and we deduce $y_0 \in \partial\phi(x_0)$.

All the costs of the form $c(x, y) = |x - y|^p$ with $p > 1$ fall under Theorem 1.4.

We finish the part dedicated to positive results by noticing that the same method may not be
used if $h$ is only convex, or at least does not give results as strong as those it gives if $h$ is
strictly convex. Yet, something is known anyway for the case $c(x, y) = |x - y|$. The results
are a bit weaker (and much harder) and are summarized below (this is the classical Monge case and
we refer to [3], even if several different proofs have been provided by different methods). Notice
that a lot of literature is currently being dedicated to the case of norms other than the Euclidean
one and of other distance functions.

Theorem 1.5. Given $\mu$ and $\nu$ probability measures on a domain $\Omega \subset \mathbb{R}^d$
there exists at least an optimal transport plan $\pi$ for the cost $c(x, y) = |x - y|$. Moreover,
one of such plans is of the form $(id \times T)_\#\mu$ provided $\mu$ is absolutely continuous. There
exists a Kantorovich potential $\psi$, and its gradient is unique $\mu$-a.e., and we have
$\psi(x) - \psi(T(x)) = |x - T(x)|$ for $\mu$-a.e. $x$, for any choice of optimal $T$ and $\psi$.

Here the absolute continuity assumption is essential to have existence of an optimal transport
map, in the sense that in general it cannot be replaced by weaker assumptions as in the strictly
convex case.

Moreover, we can provide a counter-example showing that in general it is necessary that $\mu$ does
not give mass to "small" sets.

Example 1. Set

\[ \mu = \mathcal{H}^1 \llcorner A \quad \text{and} \quad \nu = \frac{\mathcal{H}^1 \llcorner B + \mathcal{H}^1 \llcorner C}{2}, \]

where $A$, $B$ and $C$ are three vertical parallel segments in $\mathbb{R}^2$ whose vertexes lie on
the two lines $y = 0$ and $y = 1$ and the abscissas are $0$, $1$ and $-1$, respectively, and
$\mathcal{H}^1$ is the 1-dimensional Hausdorff measure. It is clear that no transport plan may
realize a cost better than 1 since, horizontally, every point needs to be displaced by a distance 1.
Moreover, one can get a sequence of maps $T_n : A \to B \cup C$ by dividing $A$ into $2n$ equal
segments $(A_i)_{i=1,\dots,2n}$ and $B$ and $C$ into $n$ segments each, $(B_i)_{i=1,\dots,n}$ and
$(C_i)_{i=1,\dots,n}$ (all ordered downwards). Then define $T_n$ as a piecewise affine map which
sends $A_{2i-1}$ onto $B_i$ and $A_{2i}$ onto $C_i$. In this way the cost of the map $T_n$ is less
than $1 + 1/n$, which implies that the infimum of the Kantorovich problem is 1, as well as the
infimum on transport maps only. Yet, no map $T$ may obtain a cost 1, as this would imply that all
points are sent horizontally, but this cannot respect the push-forward constraint. On the other
hand, the transport plans associated to $T_n$ weakly converge to the transport plan
$\frac{1}{2}(id \times T^+)_\#\mu + \frac{1}{2}(id \times T^-)_\#\mu$, where $T^\pm(x) = x \pm e$
and $e = (1, 0)$. This transport plan turns out to be the only optimal transport plan and its cost
is 1.
Notice that the same construction also provides an example of the relaxation procedure leading
from Monge to Kantorovich.
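The cost bound for the maps $T_n$ of Example 1 can be checked numerically. The sketch below is an illustration under assumptions of its own: the function name `cost_of_T_n` is hypothetical, the integral $\int_A |T_n(x) - x| \, d\mathcal{H}^1$ is approximated by a midpoint rule, and the segments are numbered upwards rather than downwards (which does not change the cost).

```python
import math

def cost_of_T_n(n, samples_per_piece=200):
    """Midpoint-rule approximation of the cost of T_n, the integral of |T_n(x) - x| over A."""
    piece = 1.0 / (2 * n)            # length of each A_k, k = 1..2n
    total = 0.0
    for k in range(1, 2 * n + 1):
        i = (k + 1) // 2             # A_{2i-1} goes onto B_i and A_{2i} onto C_i
        a_start = (k - 1) * piece    # A_k = {0} x [a_start, a_start + piece]
        b_start = (i - 1) / n        # B_i (or C_i) = {+1 or -1} x [b_start, b_start + 1/n]
        for j in range(samples_per_piece):
            s = (j + 0.5) / samples_per_piece    # relative position inside A_k
            y_from = a_start + s * piece
            y_to = b_start + s / n               # affine stretch by a factor 2
            # horizontal displacement is 1, vertical displacement is y_to - y_from
            total += math.hypot(1.0, y_to - y_from) * piece / samples_per_piece
    return total

costs = {n: cost_of_T_n(n) for n in (1, 2, 5, 20)}
```

Every value in `costs` lies strictly between $1$ and $1 + 1/n$ and decreases with $n$, in agreement with the claim that the infimum 1 is approached but not attained by maps.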




2 Wasserstein distances and spaces 

Starting from the values of the problem (K) in (1.1) we can define a set of distances over
$\mathcal{P}(\Omega)$. For any $p \ge 1$ we can define

\[ W_p(\mu, \nu) = \big( \min(K) \text{ with } c(x, y) = |x - y|^p \big)^{1/p}. \]

We recall that, by Duality Formula, we have

\[ \frac{1}{p} W_p^p(\mu, \nu) = \sup_{\psi \in \Psi_{(p)}(\Omega)} \int_\Omega \psi \, d\mu + \int_\Omega \psi^c \, d\nu. \tag{2.1} \]

Theorem 2.1. If $\Omega$ is compact, for any $p \ge 1$ the function $W_p$ is in fact a distance over
$\mathcal{P}(\Omega)$ and the convergence with respect to this distance is equivalent to the weak
convergence of probability measures. In particular any functional $\mu \mapsto W_p(\mu, \nu)$ is
continuous with respect to weak topology.

To prove that the convergence according to $W_p$ is equivalent to weak convergence one first
establishes this result for $p = 1$, through the use of the duality with the functions in
$\mathrm{Lip}_1$. Then it is possible to use the inequalities between the distances $W_p$ (see below)
to extend the result to a general $p$.

The case of a noncompact $\Omega$ is a little more difficult. First, the distance must be defined
only on a subset of the whole space of probability measures, to avoid infinite values. We will use
the space of probabilities with finite $p$-th moment:

\[ \mathbb{W}_p(\Omega) = \left\{ \mu \in \mathcal{P}(\Omega) \;:\; m_p(\mu) := \int_\Omega |x|^p \, \mu(dx) < +\infty \right\}. \]

Theorem 2.2. For any $p \ge 1$ the function $W_p$ is a distance over $\mathbb{W}_p(\Omega)$ and,
given a measure $\mu$ and a sequence $(\mu_n)_n$ in $\mathbb{W}_p(\Omega)$, the following are
equivalent:

• $\mu_n \to \mu$ according to $W_p$;
• $\mu_n \rightharpoonup \mu$ and $m_p(\mu_n) \to m_p(\mu)$;

• $\int_\Omega \phi \, d\mu_n \to \int_\Omega \phi \, d\mu$ for any $\phi \in C^0(\Omega)$ whose
growth is at most of order $p$ (i.e. there exist constants $A$ and $B$ depending on $\phi$ such that
$|\phi(x)| \le A + B|x|^p$ for any $x$).

Notice that, as a consequence of Hölder (or Jensen) inequalities, the Wasserstein distances are
always ordered, i.e. $W_{p_1} \le W_{p_2}$ if $p_1 \le p_2$. Reversed inequalities are possible only
if $\Omega$ is bounded, and in this case we have, if we set $D = \operatorname{diam}(\Omega)$, for
$p_1 \le p_2$,

\[ W_{p_1} \le W_{p_2} \le D^{1 - p_1/p_2} \, W_{p_1}^{p_1/p_2}. \]
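The ordering of the distances $W_p$ can be observed on the real line. The sketch below relies on a fact that is not proven in these notes and is taken here as an assumption: for two empirical measures on $\mathbb{R}$ with the same number of equally weighted atoms and the cost $|x-y|^p$, $p \ge 1$, matching the sorted atoms is optimal. As a sanity check it is compared with a brute-force minimum over all one-to-one assignments of atoms. The sample points and the choice $\Omega = [0, 1]$ (so $D = 1$) are arbitrary.

```python
from itertools import permutations

def wasserstein_1d(xs, ys, p):
    """W_p between the empirical measures of xs and ys (equal atoms on the line),
    computed with the monotone coupling of the sorted atoms."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(x - y) ** p for x, y in pairs) / len(xs)) ** (1.0 / p)

def brute_force(xs, ys, p):
    """Same quantity, minimizing over all one-to-one assignments of atoms."""
    best = min(sum(abs(x - y) ** p for x, y in zip(xs, perm))
               for perm in permutations(ys))
    return (best / len(xs)) ** (1.0 / p)

xs = [0.0, 0.1, 0.5, 0.9]      # hypothetical atoms of mu in [0, 1]
ys = [1.0, 0.4, 0.2, 0.7]      # hypothetical atoms of nu in [0, 1]
D = 1.0                        # diameter of the interval [0, 1]

w1, w2, w3 = (wasserstein_1d(xs, ys, p) for p in (1, 2, 3))
```

On this data one finds `w1 <= w2 <= w3`, together with the reversed bounds `w2 <= D**(1 - 1/2) * w1**(1/2)` and `w3 <= D**(1 - 2/3) * w2**(2/3)`.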

From the monotone behavior of Wasserstein distances with respect to $p$ it is natural to introduce
the distance $W_\infty$: set
$\mathbb{W}_\infty(\Omega) = \{\mu \in \mathcal{P}(\Omega) : \operatorname{spt}(\mu) \text{ is bounded}\}$
(obviously if $\Omega$ itself is bounded one has $\mathbb{W}_\infty(\Omega) = \mathcal{P}(\Omega)$)
and then

\[ W_\infty(\mu, \nu) = \inf \left\{ \gamma\text{-}\operatorname{ess\,sup}_{(x, y) \in \Omega \times \Omega} |x - y| \;:\; \gamma \in \Pi(\mu, \nu) \right\}. \]

Here $\gamma$-ess sup denotes the essential sup with respect to $\gamma$, i.e. the norm in the space
$L^\infty(\Omega \times \Omega; \gamma)$, which is the same, for continuous functions such as
$|x - y|$, as the maximal value on the support of $\gamma$. It is easy to check that
$W_p \nearrow W_\infty$ and it is interesting to study the metric space $\mathbb{W}_\infty(\Omega)$.
Curiously enough, this supremal problem in optimal transport theory, even if quite natural, had not
received much attention, up to the very recent paper [10].

The $W_\infty$ convergence is stronger than any $W_p$ convergence and hence also than the weak
convergence of probability measures. The converse is not true and the convergence in $W_\infty$
turns out to be actually rare: consequently there is a great lack of compactness in
$\mathbb{W}_\infty$. For instance it is not difficult to check that, if we set
$\mu_t = t\delta_{x_0} + (1 - t)\delta_{x_1}$, where $x_0 \ne x_1 \in \Omega$, we have
$W_\infty(\mu_t, \mu_s) = |x_0 - x_1|$ if $t \ne s$. This implies that the balls
$B(\mu_t, |x_0 - x_1|/2)$ are infinitely many disjoint balls in $\mathbb{W}_\infty$ and prevents
compactness.

The following statement summarizes the compactness properties of the spaces $\mathbb{W}_p$ for
$1 \le p \le \infty$ and its proof is a direct application of the considerations above and of
Theorem 2.2.

Proposition 2.3. For $1 \le p < \infty$ the space $\mathbb{W}_p(\Omega)$ is compact if and only if
$\Omega$ itself is compact. Moreover, for an unbounded $\Omega$ the space $\mathbb{W}_p(\Omega)$ is
not even locally compact. The space $\mathbb{W}_\infty(\Omega)$ is neither compact nor locally
compact for any choice of $\Omega$ with $\#\Omega > 1$.

3 Geodesics, continuity equation and displacement convexity
3.1 Metric derivatives in Wasserstein spaces

We are concerned in this section with several properties linked to the curves in the Wasserstein
space $\mathbb{W}_p$. For this subject the main reference is [2]. Before giving the main result we
are interested in, we recall the definition of metric derivative, which is a concept that may be
useful when studying curves which are valued in generic metric spaces.
Definition 3. Given a metric space $(X, d)$ and a curve $\gamma : [0, 1] \to X$ we define metric
derivative of the curve $\gamma$ at time $t$ the quantity

\[ |\gamma'|(t) = \lim_{h \to 0} \frac{d(\gamma(t + h), \gamma(t))}{|h|}, \tag{3.1} \]

provided the limit exists.

As a consequence of Rademacher Theorem it can be seen (see [J]) that for any Lipschitz curve
the metric derivative exists at almost every point $t \in [0, 1]$. We will be concerned quite often
with metric derivatives of curves which are valued in the space $\mathbb{W}_p(\Omega)$.

Definition 4. If we are given a Lipschitz curve $\mu : [0, 1] \to \mathbb{W}_p(\Omega)$, we define
velocity field of the curve any vector field $v : [0, 1] \times \Omega \to \mathbb{R}^d$ such that
for a.e. $t \in [0, 1]$ the vector field $v_t = v(t, \cdot)$ belongs to $[L^p(\mu_t)]^d$ and the
continuity equation

\[ \frac{d}{dt}\mu_t + \nabla \cdot (v_t \, \mu_t) = 0 \]

is satisfied in the sense of distributions: this means that for all $\phi \in C^1_c(\Omega)$ and any
$t_1 < t_2 \in [0, 1]$ it holds

\[ \int_\Omega \phi \, d\mu_{t_2} - \int_\Omega \phi \, d\mu_{t_1} = \int_{t_1}^{t_2} ds \int_\Omega \nabla\phi \cdot v_s \, d\mu_s, \]

or, equivalently, in differential form:

\[ \frac{d}{dt} \int_\Omega \phi \, d\mu_t = \int_\Omega \nabla\phi \cdot v_t \, d\mu_t \quad \text{for a.e. } t \in [0, 1]. \]

We say that $v$ is the tangent field to the curve $\mu_t$ if, for a.e. $t$, $v_t$ has minimal
$[L^p(\mu_t)]^d$ norm among all the velocity fields (actually this is not the true definition of a
tangent vector field, since this would involve the definition of a tangent space for the "manifold"
$\mathbb{W}_p$, but it is in this case the same).

The following theorem is concerned with the existence of tangent fields and comes from
Theorem 8.3.1 and Proposition 8.4.5 in [2].

Theorem 3.1. If $p > 1$ and $\mu = (\mu_t)_t$ is a curve in $\mathrm{Lip}([0, 1]; \mathbb{W}_p(\Omega))$
then there exists a unique vector field $v$ characterized by

\[ \frac{d}{dt}\mu_t + \nabla \cdot (v_t \, \mu_t) = 0, \tag{3.2} \]

\[ \|v_t\|_{L^p(\mu_t)} \le |\mu'|(t) \quad \text{for a.e. } t, \tag{3.3} \]

where the continuity equation is satisfied in the sense of distributions as previously explained.
Moreover, if (3.2) holds for a family of vector fields $(v_t)_t$ with
$\|v_t\|_{L^p(\mu_t)} \le C$ then $\mu \in \mathrm{Lip}([0, 1]; \mathbb{W}_p(\Omega))$ and
$|\mu'|(t) \le \|v_t\|_{L^p(\mu_t)}$ for a.e. $t$.
To have an idea of the meaning of the previous theorem and of the relationship between curves
of measures and the continuity equation some considerations could be useful.

Actually, at least when the vector fields $v_t$ are regular enough, the solutions of the continuity
equation $\frac{d}{dt}\mu_t + \nabla \cdot (v_t \, \mu_t) = 0$ are obtained by taking the images of
the initial measure through the maps $\sigma(t, \cdot)$ obtained by taking the solution of

\[ \sigma'(t, x) = v_t(\sigma(t, x)), \qquad \sigma(0, x) = x. \]

This explains why the vector field $v_t$ is called "velocity field" of the curve $\mu_t$: if every
particle follows at each time $t$ the velocity field $v_t$, then the position of all the particles
at time $t$ reconstructs exactly the measure $\mu_t$ that appears in the continuity equation
together with $v_t$!

Think for a while of the case of two time steps only: there are two measures $\mu_t$ and
$\mu_{t+h}$ and there are several ways of moving the particles so as to reconstruct the latter from
the former. It is exactly as when we look for a transport. One of these transports is optimal in the
sense that it minimizes $\int |T(x) - x|^p \, \mu_t(dx)$ and the value of this integral equals
$W_p^p(\mu_t, \mu_{t+h})$. If we call $v_t(x)$ the "discrete velocity" of the particle located at
$x$ at time $t$, i.e. $v_t(x) = (T(x) - x)/h$, one has
$\|v_t\|_{L^p(\mu_t)} = \frac{1}{h} W_p(\mu_t, \mu_{t+h})$. The result of the previous theorem may
be easily guessed as obtainable as a limit as $h \to 0$.

3.2 Geodesics and geodesic convexity

Once we know about curves in their generality, it is interesting to think about geodesics. The
following result is a characterization of geodesics in $\mathbb{W}_p(\Omega)$ when $\Omega$ is a
convex domain in $\mathbb{R}^d$. This procedure is also known as McCann's linear interpolation.

Theorem 3.2. All the spaces $\mathbb{W}_p(\Omega)$ are length spaces and if $\mu$ and $\nu$ belong to
$\mathbb{W}_p(\Omega)$, and $\gamma$ is an optimal transport plan from $\mu$ to $\nu$ for the cost
$c_p(x, y) = |x - y|^p$, then the curve

\[ \mu^\gamma(s) = (p_s)_\#\gamma, \]

where $p_s : \Omega \times \Omega \to \Omega$ is given by $p_s(x, y) = x + s(y - x)$, is a
constant-speed geodesic from $\mu$ to $\nu$. In the case $p > 1$ all the constant-speed geodesics are
of this form, and if $\mu$ is absolutely continuous, then there is only one geodesic and it has the
form

\[ \mu(s) = [(1 - s)\,id + sT]_\#\mu, \]

where $T$ is the optimal transport map from $\mu$ to $\nu$.
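The constant-speed property can be illustrated on the real line with $p = 2$. As in any one-dimensional sketch of this kind, the optimality of the monotone matching between sorted atoms is assumed rather than proven here, and the atoms below are arbitrary. The curve is built by moving each atom along the segment joining it to its image, which is the interpolation $p_s(x, y) = x + s(y - x)$ applied to the matched pairs.

```python
import math

def w2(xs, ys):
    """W_2 between empirical measures on the line, via the monotone coupling."""
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))

def interpolate(xs, ys, s):
    """Atoms of mu(s): each sorted atom x moves to (1 - s) x + s y, y being its match."""
    return [(1 - s) * x + s * y for x, y in zip(sorted(xs), sorted(ys))]

xs = [0.0, 0.1, 0.5, 0.9]      # hypothetical atoms of mu
ys = [1.0, 0.4, 0.2, 0.7]      # hypothetical atoms of nu
total = w2(xs, ys)             # W_2(mu, nu)

times = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = [interpolate(xs, ys, s) for s in times]
```

For every pair of times $s, t$ in `times`, the distance `w2` between the corresponding measures equals $|t - s|$ times `total` up to rounding, which is the constant-speed geodesic property.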

By means of this characterization of geodesies we can also define the useful concept of displace- 
ment convexity introduced by McCann in |13| . 

Definition 5. Given a functional F : W P (Q) n L 1 — > [0, +00], we say that it is displacement convex 
if all the maps t 1— > F(fC l (t)) are convex on [0, 1] for every choice of fi and v in W p (fJ) and 7 optimal 
transport plan from fi to v with respect to c(x,y) = \x — y\ p . 
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The following well-known result provides a wide set of displacement convex functionals. In the 
case p = 2 this result is due to McCann ([13]), while the generalization to any p can be found in [2]. 

Theorem 3.3. Consider the following functionals on the space $W_p(\Omega)$, where $\Omega$ is any convex subset 
of $\mathbb{R}^d$: 

$$J^1(\mu) = \begin{cases} \int_\Omega f(u(x))\,dx & \text{if } \mu = u \cdot \mathcal{L}^d, \\ +\infty & \text{if } \mu \text{ is not absolutely continuous;} \end{cases}$$

$$J^2(\mu) = \int_\Omega V(x)\,\mu(dx);$$

$$J^3(\mu) = \int_\Omega \int_\Omega w(x-y)\,\mu(dx)\,\mu(dy).$$

Suppose that $f : [0, +\infty] \to [0, +\infty]$ is a convex and superlinear lower semicontinuous function 
with $f(0) = 0$, and that $V : \Omega \to [0, +\infty]$ and $w : \mathbb{R}^d \to [0, +\infty]$ are convex functions. Then 
the functionals $J^2$ and $J^3$ are displacement convex in $W_p(\Omega)$ and the functional $J^1$ is displacement 
convex provided the map 

$$r \mapsto r^d f(r^{-d})$$

is convex and non-increasing on $]0, +\infty[$. 
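
The statement can be probed numerically in a very simple setting. The following is only a sketch under stated assumptions (dimension one, empirical measures with equally many atoms, sorted matching as optimal plan), with illustrative choices $V(x) = x^2$, $w(z) = |z|$ and $f(t) = t^m$ that are not taken from the notes. It evaluates $J^2$ and $J^3$ along the interpolation and tests discrete convexity through second differences, and it also checks the condition on $r \mapsto r^d f(r^{-d})$ for the chosen $f$, for which the map is simply $r^{d(1-m)}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
xs = np.sort(rng.normal(0.0, 1.0, size=n))    # atoms of mu (illustrative)
ys = np.sort(rng.uniform(2.0, 5.0, size=n))   # atoms of nu (illustrative)

def V(x):
    return x ** 2          # convex potential (assumed example)

def w(z):
    return np.abs(z)       # convex interaction kernel (assumed example)

def J2(z):
    """J^2 of the uniform empirical measure with atoms z."""
    return np.mean(V(z))

def J3(z):
    """J^3 of the uniform empirical measure with atoms z."""
    return np.mean(w(z[:, None] - z[None, :]))

def is_convex_on_grid(values, tol=1e-10):
    """Discrete convexity: second differences on a uniform grid are >= -tol."""
    return bool(np.all(np.diff(values, n=2) >= -tol))

# Interpolation along the sorted matching (optimal in 1D for p >= 1).
s_grid = np.linspace(0.0, 1.0, 51)
curve = [(1.0 - s) * xs + s * ys for s in s_grid]
j2_values = np.array([J2(z) for z in curve])
j3_values = np.array([J3(z) for z in curve])
print("J2 convex along the curve:", is_convex_on_grid(j2_values))
print("J3 convex along the curve:", is_convex_on_grid(j3_values))

# Condition on f for J^1, with f(t) = t^m: r^d f(r^{-d}) = r^{d (1 - m)}.
d, m = 3, 2.0
r = np.linspace(0.1, 5.0, 200)
g = r ** d * (r ** (-d)) ** m
print("g non-increasing:", bool(np.all(np.diff(g) <= 0)))
print("g convex:", is_convex_on_grid(g))
```

A grid test of this kind can only fail to detect non-convexity; it is a sanity check, not a proof.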



4 Monge-Ampère equation and regularity 

The final issue that we'll approach in these lecture notes concerns some regularity 
properties of $T$ and $\psi$ (the optimal transport map and the Kantorovich potential, respectively) and 
their relations with the densities of $\mu$ and $\nu$. We will consider only the quadratic case $c(x, y) = 
|x-y|^2/2$, because it is the one where more results have been proven. Very recent results for generic 
costs have been developed by Ma, Trudinger, Wang, Loeper, Figalli... They require some very rigid 
assumptions on the costs, so that, surprisingly enough, the quadratic cost is one of the few power 
costs that satisfy the suitable hypotheses. 

It is easy, just by a change-of-variables formula, to transform the equality $\nu = T_\#\mu$ into 
the PDE $v(T(x)) = u(x)/|JT|(x)$, where $u$ and $v$ are the densities of $\mu$ and $\nu$ (which have to be 
supposed regular enough) and $J$ denotes the determinant of the Jacobian matrix. Recalling that we 
may write $T = \nabla\phi$ with $\phi$ convex (as noted in an earlier remark), we get the Monge-Ampère equation 

$$M\phi = \frac{u}{v(\nabla\phi)}, \tag{4.1}$$

where $M$ denotes the determinant of the Hessian 

$$M\phi = \det H\phi = \det\left[\frac{\partial^2\phi}{\partial x_i\,\partial x_j}\right]_{i,j}.$$
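
A one-dimensional example where everything is explicit may help to see the equation at work. The sketch below is an illustration under assumptions that are not in the notes: $\mu$ is a standard Gaussian, $\nu$ is a Gaussian with mean $m$ and standard deviation $\sigma$, so that $T(x) = m + \sigma x$ is the derivative of the convex function $\phi(x) = mx + \sigma x^2/2$ and pushes $\mu$ onto $\nu$. In dimension one $M\phi = \phi''$, and the code compares a finite-difference approximation of $\phi''$ with $u/v(\nabla\phi)$; both should equal $\sigma$. The parameter values and helper names are hypothetical.

```python
import numpy as np

def gaussian_density(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

m, sigma = 1.5, 2.0   # illustrative parameters of the target Gaussian

def u(x):
    return gaussian_density(x, 0.0, 1.0)     # density of mu

def v(y):
    return gaussian_density(y, m, sigma)     # density of nu

def phi(x):
    return m * x + 0.5 * sigma * x ** 2      # convex potential

def T(x):
    return m + sigma * x                     # T = phi'

x = np.linspace(-3.0, 3.0, 601)
h = x[1] - x[0]

# Centered second difference approximating phi'' (= M phi in dimension one).
M_phi = (phi(x[2:]) - 2.0 * phi(x[1:-1]) + phi(x[:-2])) / h ** 2

# Right-hand side u / v(grad phi) at the interior grid points.
rhs = u(x[1:-1]) / v(T(x[1:-1]))

max_error = np.max(np.abs(M_phi - rhs))
print("max |M phi - u / v(T)| on the grid:", max_error)
```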



This equation up to now is satisfied by $\phi = \frac{|x|^2}{2} - \psi$ in a formal way only. We define various notions 
of solutions for (4.1): 

• we say that $\phi$ satisfies (4.1) in the Brenier sense if $(\nabla\phi)_\#(u \cdot \mathcal{L}^d) = v \cdot \mathcal{L}^d$ (and this is actually 
the sense to be given to this equation); 

• we say that $\phi$ satisfies (4.1) in the Alexandroff sense if $H\phi$, which is always a positive measure 
for $\phi$ convex, is absolutely continuous and its density satisfies (4.1) a.e.; 

• we say that $\phi$ satisfies (4.1) in the viscosity sense if it satisfies the usual comparison properties 
required by viscosity theory but restricting the comparisons to regular convex test functions 
(since $M$ is in fact monotone just when restricted to positive definite matrices); 

• we say that $\phi$ satisfies (4.1) in the classical sense if it is of class $C^2$ and the equation holds 
pointwise. 

Notice that any notion except the first may also be applied to the more general equation $M\phi = f$, 
while the first one just applies to this specific transportation case. The results we want to use are 
well summarized in Theorem 50 of [16]: 

Theorem 4.1. If $u$ and $v$ are $C^{0,\alpha}(\overline{\Omega})$ and are both bounded from above and from below on the 
whole $\overline{\Omega}$ by positive constants and $\Omega$ is a convex open set, then the unique Brenier solution $\phi$ of 
(4.1) belongs to $C^{2,\alpha}(\Omega) \cap C^{1,\alpha}(\overline{\Omega})$ and $\phi$ satisfies the equation in the classical sense (hence also in 
the Alexandroff and viscosity senses). 

Even if this precise statement is taken from [16], we just detail a possible bibliographical path to 
arrive at this result. It is not easy to deal with Brenier solutions, so the idea is to consider viscosity 
solutions, for which it is in general easy to prove existence by Perron's method. Then one proves some 
regularity result on viscosity solutions, up to getting a classical solution. After that, once we have 
a classical convex solution to the Monge-Ampère equation, this will be a Brenier solution too. Since 
the Brenier solution is unique (up to additive constants), we have obtained a regularity statement for Brenier solutions. 
We can find results on viscosity solutions in [7], [9] and [8]. In [7] some conditions are given which ensure strict 
convexity of the solution of $M\phi = f$ when $f$ is bounded from above and below. In [9], 
$C^{1,\alpha}$ regularity is proved for the same equation, provided we have strict convexity. In this way 
the term $u/v(\nabla\phi)$ becomes a $C^{0,\alpha}$ function, and in [8] $C^{2,\alpha}$ regularity is proved for solutions of 
$M\phi = f$ with $f \in C^{0,\alpha}$. 